11 research outputs found
Influence mining from unstructured big data
A crucial component of any intelligent system is understanding and predicting the behavior of its users. A correct model of a user's behavior enables the system to serve that user's needs more effectively. While much work has been done on user behavior modeling based on historical activity data, little attention has been paid to how external factors influence user behavior, which is clearly important for improving an intelligent system. The influence of external factors on user behavior is mostly reflected in two different ways: 1) through significant growth in users' thirst for information related to external factors (e.g., a user may conduct many searches related to a popular event or to some community of interest), and 2) through user-generated content that is directly or indirectly related to the external factors (e.g., a user may tweet about a particular event). To capture these two aspects of user behavior, I introduce Influence Models for both Information Thirst and Content Generation, in turn, in this thesis. To the best of my knowledge, influence models for Information Thirst and Content Generation have not been studied before.
The thesis starts by introducing a new data mining problem: how to mine the influence of real-world events on users' information thirst, which is important both for social science research and for designing better search engines. I solve this problem by proposing computational measures that quantify the influence of an event on a query in order to identify triggered queries, and then proposing a novel extension of the Hawkes process to model the evolving trend of an event's influence on search queries. Evaluation results using news articles and search log data show that the proposed approach is effective for identifying queries triggered by events reported in news articles and for characterizing the influence trend over time.
This influence model assumes that each event exerts its influence independently. This assumption is unrealistic: many real-world events are correlated and influence each other, and thus influence user search behavior jointly rather than independently. To relax this assumption, in the next part of my thesis, I propose a Joint Influence Model based on the Multivariate Hawkes Process, which captures the interdependence among multiple events in terms of their influence. Experimental study shows that the Joint Influence Model achieves higher accuracy than the independent model.
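Both models build on the self-exciting intensity of the Hawkes process, in which each past event temporarily raises the rate of future arrivals (here, event-triggered queries). A minimal sketch of the univariate intensity, with illustrative rather than fitted parameter values:

```python
import math

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.8, beta=1.0):
    # lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)):
    # a base rate plus an exponentially decaying boost from each earlier event.
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in event_times if ti < t)

# Arrivals clustered shortly after a triggering event (hypothetical timestamps).
past = [0.0, 0.5, 1.2]
print(round(hawkes_intensity(2.0, past), 4))  # → 0.7462
```

In the multivariate extension used by the Joint Influence Model, each event type has its own base rate and a matrix of excitation coefficients, so the occurrence of one event can raise the intensity of others.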
The second way to observe external influence on user behavior is to analyze user-generated content that is directly or indirectly related to those external factors, which I discuss in the last part of the thesis. For example, user-generated content is often significantly influenced by the community to which the user belongs. While some work has been done on mining such influence from structured information networks, little attention has been paid to mining community influence from user-generated unstructured data. To study such influence, I introduce the problem of mining community influence from user-generated unstructured content, particularly in the context of text generation. Although text generation has recently become a popular research topic following the surge of deep learning techniques, existing methods do not incorporate community influence into the generation process, and thus the process does not evolve over time. This clearly limits their application to text stream data, as most text streams evolve over time, showing distinct patterns corresponding to the shifting interests of the target community. To address this limitation, I propose an Influenced Text Generation (ITG) Process that can capture this evolution of the text generation process corresponding to evolving community influence over time. ITG is based on a deep learning architecture and uses LSTM cells within the hidden layers of a recurrent neural network. Experimental results with six independent text streams comprised of conference paper titles show that the proposed ITG method is effective in capturing the influence of different research communities on the paper titles generated by researchers.
TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks
While LLMs have shown great success in understanding and generating text in
traditional conversational settings, their potential for performing ill-defined
complex tasks is largely under-studied. Indeed, we are yet to conduct
comprehensive benchmarking studies with multiple LLMs that are exclusively
focused on a complex task. However, conducting such benchmarking studies is
challenging because of the large variations in LLMs' performance when different
prompt types/styles are used and different degrees of detail are provided in
the prompts. To address this issue, the paper proposes a general taxonomy that
can be used to design prompts with specific properties in order to perform a
wide range of complex tasks. This taxonomy will allow future benchmarking
studies to report the specific categories of prompts used as part of the study,
enabling meaningful comparisons across different studies. Also, by establishing
a common standard through this taxonomy, researchers will be able to draw more
accurate conclusions about LLMs' performance on a specific complex task.
Joint Upper & Lower Bound Normalization for IR Evaluation
In this paper, we present a novel perspective towards IR evaluation by
proposing a new family of evaluation metrics where the existing popular metrics
(e.g., nDCG, MAP) are customized by introducing a query-specific lower-bound
(LB) normalization term. While the original nDCG, MAP, etc. metrics are normalized
in terms of their upper bounds based on an ideal ranked list, a corresponding
LB normalization for them has not yet been studied. Specifically, we introduce
two different variants of the proposed LB normalization, where the lower bound
is estimated from a randomized ranking of the corresponding documents present
in the evaluation set. We next conducted two case studies by instantiating the
new framework for two popular IR evaluation metrics (each with two variants, i.e.,
DCG_UL_V1,2 and MSP_UL_V1,2) and then comparing against the traditional metrics
without the proposed LB normalization. Experiments on two different data-sets
with eight Learning-to-Rank (LETOR) methods demonstrate the following
properties of the new LB-normalized metrics: 1) Statistically significant
differences (between two methods) in terms of the original metric no longer remain
statistically significant in terms of the Upper-Lower (UL) bound normalized version,
and vice versa, especially for uninformative query sets. 2) When compared
against the original metric, our proposed UL normalized metrics demonstrate
higher Discriminatory Power and better Consistency across different data-sets.
These findings suggest that the IR community should seriously consider UL
normalization when computing nDCG and MAP, and that a more in-depth study of UL
normalization for general IR evaluation is warranted.
Comment: 26 pages, 3 figures
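The core idea can be sketched as follows: normalize DCG jointly by an upper bound (the DCG of the ideal ranking) and a lower bound estimated from random shuffles of the same documents. The gain values and the Monte Carlo shuffle estimator below are illustrative; the paper's two lower-bound variants (V1/V2) may be computed differently.

```python
import math
import random

def dcg(gains):
    # Discounted cumulative gain with the standard log2 position discount.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ul_normalized_dcg(gains, n_shuffles=1000, seed=0):
    # Upper bound: DCG of the ideal (descending-gain) ranking.
    ub = dcg(sorted(gains, reverse=True))
    # Lower bound: expected DCG of a random shuffle of the same documents,
    # estimated here by Monte Carlo sampling with a fixed seed.
    rng = random.Random(seed)
    perm = list(gains)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(perm)
        total += dcg(perm)
    lb = total / n_shuffles
    # Joint UL normalization: 1.0 for the ideal list, ~0.0 for a random one.
    return (dcg(gains) - lb) / (ub - lb)

print(round(ul_normalized_dcg([3, 2, 3, 0, 1, 2]), 3))
```

A ranked list that is better than the ideal is impossible, so the score is at most 1; a list worse than the random-shuffle expectation yields a negative score, which is exactly the extra discriminative signal plain upper-bound normalization discards.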
FaNS: a Facet-based Narrative Similarity Metric
Similar Narrative Retrieval is a crucial task since narratives are essential
for explaining and understanding events, and multiple related narratives often
help to create a holistic view of the event of interest. To accurately identify
semantically similar narratives, this paper proposes a novel narrative
similarity metric called Facet-based Narrative Similarity (FaNS), based on the
classic 5W1H facets (Who, What, When, Where, Why, and How), which are extracted
by leveraging state-of-the-art Large Language Models (LLMs). Unlike
existing similarity metrics that only focus on overall lexical/semantic match,
FaNS provides a more granular matching along six different facets independently
and then combines them. To evaluate FaNS, we created a comprehensive dataset by
collecting narratives from AllSides, a third-party news portal. Experimental
results demonstrate that the FaNS metric exhibits a 37% higher correlation than
traditional text similarity metrics that directly measure the lexical/semantic
match between narratives, demonstrating its effectiveness in comparing the finer
details between a pair of narratives.
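The combination step of FaNS can be sketched as below. The facet values and the token-overlap similarity are stand-ins: the paper extracts the 5W1H facets with LLMs and scores semantic (not merely lexical) match, and its exact facet weighting may differ from the equal-weight average used here.

```python
FACETS = ["who", "what", "when", "where", "why", "how"]

def facet_sim(a, b):
    # Jaccard token overlap: a crude proxy for the per-facet semantic match.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def fans_score(n1, n2):
    # Combine the six 5W1H facet similarities (equal weights assumed here).
    return sum(facet_sim(n1[f], n2[f]) for f in FACETS) / len(FACETS)

# Two hypothetical narratives, already decomposed into 5W1H facets.
n1 = {"who": "city council", "what": "approved the budget",
      "when": "on monday", "where": "at city hall",
      "why": "to fund schools", "how": "by unanimous vote"}
n2 = {"who": "the council", "what": "approved a budget",
      "when": "monday", "where": "city hall",
      "why": "to fund local schools", "how": "by a narrow vote"}
print(round(fans_score(n1, n2), 3))
```

Scoring each facet independently is what makes the metric granular: two narratives can agree on who and where while disagreeing on why, which a single overall similarity score would blur together.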
Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization
While very popular for evaluating the extractive summarization task, the ROUGE
metric has long been criticized for its lack of semantic awareness and its
ignorance of the ranking quality of the summarizer. Previous research has
addressed these issues by proposing a gain-based automated metric called
Sem-nCG, which is both rank and semantic aware. However, Sem-nCG
does not consider the amount of redundancy present in a model-generated summary
and currently does not support evaluation with multiple reference summaries.
Unfortunately, addressing both these limitations simultaneously is not trivial.
Therefore, in this paper, we propose a redundancy-aware Sem-nCG metric and
demonstrate how this new metric can be used to evaluate model summaries against
multiple references. We also explore different ways of incorporating redundancy
into the original metric through extensive experiments. Experimental results
demonstrate that the new redundancy-aware metric exhibits a higher correlation
with human judgments than the original Sem-nCG metric for both single and
multiple reference scenarios.
On Evaluation of Bangla Word Analogies
This paper presents a high-quality dataset for evaluating the quality of
Bangla word embeddings, which is a fundamental task in the field of Natural
Language Processing (NLP). Despite being the 7th most-spoken language in the
world, Bangla is a low-resource language on which popular NLP models fail to
perform well. Developing a reliable evaluation test set for Bangla word embeddings
is crucial for benchmarking and guiding future research. We provide a
Mikolov-style word analogy evaluation set specifically for Bangla, with a
sample size of 16,678, as well as a translated and curated version of the
Mikolov dataset, which contains 10,594 samples for cross-lingual research. Our
experiments with different state-of-the-art embedding models reveal that Bangla
has its own unique characteristics, and current embeddings for Bangla still
struggle to achieve high accuracy on both datasets. We suggest that future
research should focus on training models with larger datasets and considering
the unique morphological characteristics of Bangla. This study represents the
first step towards building a reliable NLP system for the Bangla language.
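A Mikolov-style analogy item asks "a is to b as c is to ?" and is scored by whether the nearest neighbor of vec(b) - vec(a) + vec(c) is the expected word. A self-contained sketch of this 3CosAdd evaluation with toy 2-D vectors (a real evaluation would use trained Bangla embeddings over the full vocabulary):

```python
import math

def solve_analogy(emb, a, b, c):
    # 3CosAdd: return the word whose vector is most cosine-similar to
    # vec(b) - vec(a) + vec(c), excluding the three query words themselves.
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    tnorm = math.sqrt(sum(x * x for x in target))
    best, best_sim = None, -2.0
    for word, vec in emb.items():
        if word in (a, b, c):
            continue
        vnorm = math.sqrt(sum(x * x for x in vec))
        sim = sum(x * y for x, y in zip(vec, target)) / (vnorm * tnorm)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Toy embeddings (hypothetical; a real run uses trained Bangla word vectors).
emb = {"king": [1.0, 1.0], "queen": [1.0, -1.0],
       "man": [0.9, 1.1], "woman": [0.9, -0.9]}
print(solve_analogy(emb, "man", "woman", "king"))  # → queen
```

Accuracy on the dataset is then simply the fraction of analogy items for which the predicted word matches the expected answer.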
Exploring Challenges of Deploying BERT-based NLP Models in Resource-Constrained Embedded Devices
BERT-based neural architectures have established themselves as popular
state-of-the-art baselines for many downstream NLP tasks. However, these
architectures are data-hungry and consume a lot of memory and energy, often
hindering their deployment in many real-time, resource-constrained
applications. Existing lighter versions of BERT (e.g., DistilBERT and TinyBERT)
often cannot perform well on complex NLP tasks. More importantly, from a
designer's perspective, it is unclear which BERT-based architecture is the
"right" one for a given NLP task, i.e., one that strikes the optimal trade-off
between the resources available and the minimum accuracy desired by the end
user. System engineers have to spend a lot of time conducting trial-and-error
experiments to find a suitable answer to this question. This paper presents an
exploratory study of BERT-based models under different resource constraints and
accuracy budgets to derive empirical observations about these resource/accuracy
trade-offs. Our findings can help designers make informed choices among
alternative BERT-based architectures for embedded systems, thus saving
significant development time and effort.